Internet SCSI Communication Via UNDI Services

ABSTRACT

A method and system for emulating a hardware Internet Small Computer System Interface (iSCSI) Host Bus Adapter (HBA) without risking an interruption of communication between a computer and a remote secondary storage device is presented. During normal operations, a (hardware-emulating) software iSCSI HBA drives a Network Interface Card (NIC) to afford communication between the computer and the remote secondary storage. If an operating system (OS) anomaly occurs in the computer, the NIC is normally disconnected by the OS. To maintain communication between the computer and the secondary storage device if such an event occurs, a failover network device is called up by the computer's System Management Mode (SMM) Basic Input Output System (BIOS), which allows uninterrupted communication to continue between the computer and remote secondary storage device.

PRIORITY CLAIM

The present application is a continuation of U.S. patent application Ser. No. 11/127,397 (Atty. Docket No. RPS920050041US1), filed on May 12, 2005, and entitled, “Internet SCSI Communication Via UNDI Services,” which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates in general to the field of computers, and in particular to network based computers. Still more particularly, the present invention relates to a method and system for maintaining software-based Internet Small Computer System Interface (iSCSI) communication between a computer and a secondary storage if an operating system (OS) anomaly in the computer disrupts the operation of an iSCSI network interface driver.

2. Description of the Related Art

Two ongoing improvements to modern computers are speed and resource sharing. Computers such as blade server computers, which have multiple server blades in a single server chassis, have processors and input/output (I/O) busses that continue to increase in speed and bandwidth capacity. The same is true for secondary memory devices such as hard drive arrays. However, devices that allow computers to communicate remotely with secondary memory devices often cause a data bottleneck.

For example, consider the prior art network topology depicted in FIG. 1. A computer 102 is shown having an operating system (OS) 104, which includes a Small Computer System Interface (SCSI) driver 106. SCSI driver 106 allows data to be put on a SCSI bus (not shown) in computer 102, to which can be connected secondary storage devices such as local hard drives (also not shown).

In an effort to promote scalability and resource sharing, computer 102 uses Internet SCSI (iSCSI). iSCSI is an Internet Protocol (IP) based storage networking standard that has been developed by the Internet Engineering Task Force (IETF), whose current iSCSI standard is herein incorporated by reference in its entirety. Data destined for a storage device on a SCSI bus is wrapped in an IP packet, and sent over the Internet to a remote storage device, which unwraps the IP packet to recover the SCSI commands and data. This function of wrapping and unwrapping SCSI commands and data for computer 102 is performed by iSCSI packaging software 108 found in a hardware iSCSI Host Bus Adapter (HBA) 110, which typically is coupled to a Peripheral Component Interconnect (PCI) bus 112 in computer 102.
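By way of illustration only, the wrapping and unwrapping performed by iSCSI packaging software 108 can be sketched as follows. The framing shown (a two-byte command length and a four-byte data length ahead of the payload) and the function names `wrap_scsi` and `unwrap_scsi` are assumptions made for this sketch; they are not the header layout defined by the IETF iSCSI standard, which uses a 48-byte Basic Header Segment.

```python
import struct
from typing import Tuple

# Hypothetical, simplified framing used only for this sketch:
#   2-byte command length | 4-byte data length | command bytes | data bytes
_HEADER = struct.Struct("!HI")  # network byte order, no padding (6 bytes)


def wrap_scsi(command: bytes, data: bytes = b"") -> bytes:
    """Package a SCSI command and its data into one payload for the network."""
    return _HEADER.pack(len(command), len(data)) + command + data


def unwrap_scsi(payload: bytes) -> Tuple[bytes, bytes]:
    """Recover the SCSI command and data from a payload built by wrap_scsi."""
    if len(payload) < _HEADER.size:
        raise ValueError("payload shorter than header")
    cmd_len, data_len = _HEADER.unpack(payload[:_HEADER.size])
    if len(payload) != _HEADER.size + cmd_len + data_len:
        raise ValueError("payload length does not match header")
    start = _HEADER.size
    command = payload[start:start + cmd_len]
    data = payload[start + cmd_len:]
    return command, data
```

In this sketch the sender calls `wrap_scsi` before handing the payload to the network path, and the remote storage side calls `unwrap_scsi` to recover the original command and data.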

Hardware iSCSI HBA 110's main components include a processor 114 and a Network Interface Card (NIC) 116. Processor 114 utilizes instructions from iSCSI packaging software 108 to wrap/unwrap the IP packets, and NIC 116 affords communication between computer 102 and a network 118, which may be an Ethernet, Internet, or any other network capable of supporting the IP protocol.

The IP/iSCSI packets are communicated with an iSCSI target 120, which is the server component of a Storage Area Network (SAN), which includes a secondary memory represented as a Hard Drive Array (HDA) 122. Thus, data to be written to and read from HDA 122 by computer 102 is able to be communicated via network 118, which allows HDA 122 to be at any remote location away from computer 102.

As noted above, hardware iSCSI HBA 110 is a main bottleneck to data traveling between HDA 122 and computer 102. There are several reasons why this is the case, including speed constraints inherent in hardware iSCSI HBA 110's processor 114 and NIC 116. Thus, there is a need to develop a method and system that avoids this hardware bottleneck.

SUMMARY OF THE INVENTION

In response to the shortcomings of the prior art system described, the present invention is thus directed to a method and system for emulating a hardware Internet Small Computer System Interface (iSCSI) Host Bus Adapter (HBA) without risking an interruption of communication between a computer and a remote secondary storage device. During normal operations, a (hardware-emulating) software iSCSI HBA drives a Network Interface Card (NIC) to afford communication between the computer and the remote secondary storage. If an operating system (OS) anomaly occurs in the computer, the NIC is normally disconnected by the OS. To maintain communication between the computer and the secondary storage device if such an event occurs, a failover network device is called up by the computer's System Management Mode (SMM) Basic Input Output System (BIOS), which allows uninterrupted communication to continue between the computer and remote secondary storage device.

The above, as well as additional purposes, features, and advantages of the present invention will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further purposes and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, where:

FIG. 1 depicts a prior art network topology using a remote hard drive array for use by a local computer;

FIG. 2 a illustrates a software Host Bus Adapter emulator in the local computer shown in FIG. 1;

FIG. 2 b depicts a failover system used if an anomaly occurs in an operating system in the local computer, which anomaly causes the software Host Bus Adapter emulator to break communication between the remote hard drive array and the local computer;

FIG. 2 c illustrates additional detail of the operation of the failover system shown in FIG. 2 b;

FIG. 3 depicts an exemplary local computer, shown as a blade server, in which the present invention can be implemented;

FIG. 4 is a flow-chart of an overview of steps taken in the failover system of the present invention;

FIG. 5 is a flow-chart of steps taken in the present invention as described from the perspective of an overall approach of an Internet Small Computer System Interface (iSCSI) Boot/Page Media (BPM) failover;

FIG. 6 is a flow-chart showing an overview of a Login process for the iSCSI/BPM failover process;

FIG. 7 is a flow-chart showing the iSCSI/BPM failover process in a steady state; and

FIG. 8 is a flow-chart showing the iSCSI/BPM logging out process.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference now to FIG. 2 a, there is depicted a network topology using a software Internet Small Computer System Interface (iSCSI) Host Bus Adapter (HBA) emulator 202 (HBA emulator 202). HBA emulator 202 uses software to emulate the hardware iSCSI HBA 110 described above. HBA emulator 202 includes the ability to package packets of SCSI data, as described above using iSCSI packaging software 108. The iSCSI data packets (wrapped with an Internet Protocol (IP) header) are sent to a Network Interface Card (NIC) driver 204, which sends the iSCSI data packets to network 118 via a NIC 206 in the computer 102. HBA emulator 202 is used instead of hardware iSCSI HBA 110.

If computer 102 is using Windows® as its operating system 104, NIC driver 204 will be shut down at the beginning of an OS anomaly, such as the OS shutting down, the OS receiving an upgrade (or patch), or the OS crashing. Thus, HBA emulator 202 is unable to communicate with NIC 206, leaving the communication pathway between computer 102 and hard drive array 122 broken. This situation not only interrupts communication between computer 102 and hard drive array 122, but also may result in the operating system 104 not “knowing” that data sent to HBA emulator 202 is not getting passed on to NIC 206. To address this problem, as shown in FIG. 2 b, HBA emulator 202, in response to detecting an OS anomaly, sends a System Management Interrupt (SMI) signal to a Basic Input/Output System (BIOS) iSCSI boot loader 208 found in computer 102's System Management Mode (SMM) BIOS 210. BIOS iSCSI boot loader 208 calls up a Universal Network Device Interface (UNDI) driver 212, which creates a failover link between network 118 and operating system 104 (more specifically, whatever relevant portions of operating system 104 are not affected by the OS anomaly) via NIC 206. The UNDI driver 212 then re-enables the NIC 206 to permit communication between computer 102 and hard drive array 122.
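The switch between the two communication paths can be modeled, purely as a sketch, by a small state holder. The class name `HbaEmulatorModel`, its methods, and the `Transport` labels are hypothetical names introduced for this illustration; the model only records which path would carry a packet and does not touch real hardware, an SMI, or a BIOS.

```python
from enum import Enum


class Transport(Enum):
    NIC_DRIVER = "OS NIC driver"       # normal path (NIC driver 204)
    UNDI = "SMM BIOS UNDI driver"      # failover path (UNDI driver 212)


class HbaEmulatorModel:
    """Toy model of the path selection made by HBA emulator 202."""

    def __init__(self) -> None:
        self.transport = Transport.NIC_DRIVER
        self.smi_raised = False

    def on_os_anomaly(self) -> None:
        # Stands in for sending the SMI signal that leads the BIOS iSCSI
        # boot loader to call up the UNDI driver.
        self.smi_raised = True
        self.transport = Transport.UNDI

    def on_os_recovered(self) -> None:
        # Once OS network services are back, return to the normal path.
        self.smi_raised = False
        self.transport = Transport.NIC_DRIVER

    def send(self, packet: bytes) -> str:
        # Report which path would carry the packet.
        return f"{len(packet)} bytes via {self.transport.value}"
```

A caller would invoke `on_os_anomaly()` when the anomaly is detected and `on_os_recovered()` when the original OS services are available again; `send` then reports the path in use.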

Further detail of the steps described in FIG. 2 b is shown in FIG. 2 c. First, note that SMM is a reduced power consumption state provided by certain processors, including some manufactured by Intel®. When a CPU 214 enters SMM, CPU 214 saves its current state in computer 102's Static Random Access Memory (SRAM) 216 in an area called System Management RAM (SMRAM). CPU 214 then runs, from SMRAM, an SMM handler program, which calls the BIOS iSCSI boot loader 208 as described above.

UNDI driver 212 refers to a universal driver that is compliant with the UNDI standard, which provides a hardware-independent and OS-independent mechanism for communicating with a network. UNDI provides a mechanism for Pre-Boot Execution Environment (PXE) base code to use a NIC for network access without controlling the NIC hardware directly. While UNDI can be implemented in either hardware or software, in the preferred embodiment of the present invention it is implemented in software in the SMM BIOS 210.

Thus, software iSCSI HBA emulator 202, which runs under operating system 104, is bypassed, and communication with iSCSI target 120 is via BIOS iSCSI boot loader 208, which contains resident code for booting iSCSI target 120 as if it were a local drive. OS driver 224, which represents an active portion of operating system 104 that is not affected by the OS anomaly, thus continues to communicate with hard drive array 122 via the now (by virtue of the code from the BIOS iSCSI boot loader 208) loaded iSCSI target 120. Note that while in a preferred embodiment hard drive array 122 is an array of hard drives (as the name suggests), alternatively hard drive array 122 may be any secondary memory, which is defined as any non-volatile memory that cannot be directly processed by a CPU, and includes but is not limited to hard drive arrays, tape drive(s), optical disk drives, and other similar mass storage devices.

With reference now to FIG. 3, there is depicted a block diagram showing additional detail of computer 102, which is shown for exemplary purposes as a blade server 302 found in a blade server computer (not shown, but understood to be composed of a chassis holding multiple blade servers, each of which has one or more processors). Blade server 302 includes a management module 304, which permits coordination of operations among other blade servers 302 within the blade server computer.

Blade server 302 also includes a processor unit 306, which may be one or more processors operating in harmony, coupled to a system bus 308. Also coupled to system bus 308 is a video adapter 310, which drives/supports a display 312.

System bus 308 is coupled via a bus bridge 312 to an Input/Output (I/O) bus 314. Coupled to I/O bus 314 is an I/O interface 316, which affords communication with various I/O devices, including a keyboard 318, a mouse 320, a Compact Disk-Read Only Memory (CD-ROM) drive 322, a floppy disk drive 324, and a flash drive memory 326. The format of the ports connected to I/O interface 316 may be any known to those skilled in the art of computer architecture, including but not limited to Universal Serial Bus (USB) ports.

Blade server 302 is able to communicate with network 118 via a network interface such as Network Interface Card (NIC) 206 (also shown in FIG. 2 b), which is coupled to system bus 308. Network 118 may be a Local Area Network (LAN), or preferably is a Wide Area Network (WAN) such as the Internet.

Also coupled to system bus 308 is an SMM BIOS 210, discussed above in reference to FIG. 2 b, which shows BIOS iSCSI boot loader 208 and UNDI driver 212. In a preferred embodiment, hard drive 328, along with firmware such as found in a System Management Mode Basic Input/Output System (SMM BIOS) 210 chip, populates a system memory 330, which is also coupled to system bus 308. Data that populates system memory 330 includes blade server 302's operating system 104, which includes a command interpreter program known as a shell 332, which is incorporated in a higher level operating system layer and utilized for providing transparent user access to resources such as application programs 334.

As is well known in the art, a command interpreter or “shell” is generally a program that provides an interpreter and interfaces between the user and the operating system. More specifically, a shell program executes commands that are entered into a command line user interface or from a file. The shell (UNIX) or command processor (Windows) is generally the highest level of the operating system software hierarchy and serves as a command interpreter. The shell typically provides a system prompt, interprets commands entered by keyboard, mouse, or other user input media, and sends the interpreted command(s) to the appropriate lower levels of the operating system (e.g., a kernel 336) for processing.

Exemplary application programs 334 used in the present invention are a web browser 338 and iSCSI HBA emulator 202 (discussed above). Web browser 338 includes program modules and instructions enabling a World Wide Web (WWW) client (i.e., blade server 302) to send and receive network messages to the Internet using HyperText Transfer Protocol (HTTP) messaging.

Note that the hardware elements depicted in blade server 302 are not intended to be exhaustive, but rather are representative to highlight essential components required by the present invention. For instance, blade server 302 may include alternate memory storage devices such as magnetic cassettes, Digital Versatile Disks (DVDs), Bernoulli cartridges, and the like. These and other variations are intended to be within the spirit and scope of the present invention.

With reference now to FIG. 4, a flow-chart showing exemplary steps taken by the present invention is provided. After initiator block 402, the status of operating system (OS) network services is monitored (block 404). This monitoring includes checking the OS's queue status (including a representation of whether threads in programs are being properly handled) and whether there are any Plug-and-Play (PnP) callbacks (indicating a possible need for a device driver to be loaded and/or downloaded from a remote location). In addition (block 406), any storage activity and/or expiration of a pre-determined timer may likewise indicate that an anomaly or event related to the OS has occurred which will cause the OS to initiate a shut-down or at least a quiescent state, which will cause communication using NIC driver 204 to end, as discussed above.
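As a sketch only, the indicators monitored in blocks 404 and 406 can be gathered into one record and reduced to a single flag that triggers the query of block 408. The record `OsNetworkStatus`, its field names, and the rule that any one indicator is sufficient are assumptions of this illustration; the specification does not prescribe how the indicators are combined.

```python
from dataclasses import dataclass


@dataclass
class OsNetworkStatus:
    """Hypothetical snapshot of the indicators checked in blocks 404 and 406."""
    queue_healthy: bool          # threads/queues are being handled properly
    pnp_callback_pending: bool   # a Plug-and-Play callback was observed
    timer_expired: bool          # the pre-determined timer ran out


def anomaly_suspected(status: OsNetworkStatus) -> bool:
    """Return True if any indicator suggests an OS anomaly (assumed OR rule)."""
    return (
        not status.queue_healthy
        or status.pnp_callback_pending
        or status.timer_expired
    )
```

Under this assumed rule, a healthy queue with no callbacks and an unexpired timer yields no suspicion, while any single indicator is enough to proceed to the query.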

A query is made (query block 408), in response to detecting the OS anomaly, asking if OS network services are up. That is, the query determines if the OS is initiating a shut-down or quiescent mode. If the OS network services are still up, then all transmissions/receipts of data packets (block 410), including those to hard drive array 122 described above, are handled normally using NIC driver 204, and the process returns to block 404 to continue monitoring the status of the OS network services.

If the OS network services are not up (an anomaly is occurring), the outbound packets and inbound buffers are prepared (block 412). Pending packets are preferably copied to an SMM visible region, and any available buffers are updated. Memories are copied and databases are updated to prepare the system for a System Management Interrupt (SMI) call that drives the current OS state (contents of registers containing current interim state values) to be saved (block 414).

Next, the iSCSI function is started (block 416) by entering into SMM mode as described above. This includes initiating Transmission Control Protocol (TCP) functionality as well as checking the “receiving” ring (block 418) for incoming data from the hard drive array to the computer, and checking the “transmitting” ring (block 420) for outgoing data from the computer to the hard drive array.
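The checking of the receiving and transmitting rings in blocks 418 and 420 can be illustrated with a minimal software model. The class `PacketRing`, the function `service_rings`, and the callback parameters `send` and `deliver` are hypothetical; a fixed-capacity queue stands in for whatever ring structure the NIC actually exposes.

```python
from collections import deque
from typing import Callable, Optional


class PacketRing:
    """Fixed-capacity FIFO standing in for a NIC packet ring (an assumption)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._slots: deque = deque()

    def post(self, packet: bytes) -> bool:
        """Queue a packet; return False if the ring is full."""
        if len(self._slots) >= self.capacity:
            return False
        self._slots.append(packet)
        return True

    def poll(self) -> Optional[bytes]:
        """Take the oldest packet, or None if the ring is empty."""
        return self._slots.popleft() if self._slots else None


def service_rings(rx: PacketRing, tx: PacketRing,
                  send: Callable[[bytes], None],
                  deliver: Callable[[bytes], None]) -> int:
    """Drain the receive ring, then the transmit ring; return packets handled."""
    handled = 0
    packet = rx.poll()
    while packet is not None:          # incoming data from the hard drive array
        deliver(packet)
        handled += 1
        packet = rx.poll()
    packet = tx.poll()
    while packet is not None:          # outgoing data to the hard drive array
        send(packet)
        handled += 1
        packet = tx.poll()
    return handled
```

Polling, rather than interrupt-driven handling, is used in this sketch because interrupts are described elsewhere in the specification as disabled while in System Management Mode.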

After using the iSCSI failover system described above in FIG. 2 c, a determination is made (block 422) that the original OS services are again available (the anomaly is over). The Advanced Programmable Interrupt Controller (APIC), which handles interrupts from and for multiple CPUs, is cleaned up (block 424) as needed (due to having some interrupt pending due to UNDI, which must be cleared so that the original OS services do not respond to that interrupt). A call is made to restore the original OS states (block 426), inbound packets are reconciled (block 428), and a determination is made to see if there are other data requests between the computer and the hard drive array (query block 430). If so, then the process returns to block 404. Otherwise, the original OS responds normally to data storage requests from the computer to the hard drive array (block 432), and the process ends (terminator block 434).
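The order of the blocks of FIG. 4, as described in the preceding paragraphs, can be summarized in a short sketch that returns the sequence of block numbers visited in one pass. The function name `fig4_pass` and its two boolean parameters are introduced for this illustration only; the function traces the control flow and performs none of the underlying operations.

```python
from typing import List


def fig4_pass(os_services_up: bool, more_requests: bool = False) -> List[int]:
    """Trace one pass through FIG. 4 as a list of block numbers."""
    trace = [402, 404, 406, 408]       # start, monitor, timer/activity, query
    if os_services_up:
        # Normal handling via the NIC driver, then back to monitoring.
        trace += [410, 404]
        return trace
    # Failover path: prepare buffers, save OS state, run iSCSI in SMM,
    # check both rings, then restore and reconcile once the OS is back.
    trace += [412, 414, 416, 418, 420, 422, 424, 426, 428, 430]
    # Query block 430: return to monitoring, or finish normally.
    trace += [404] if more_requests else [432, 434]
    return trace
```

For example, `fig4_pass(True)` ends back at block 404, while `fig4_pass(False)` runs the failover blocks and ends at terminator block 434.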

Referring now to FIG. 5, a flow-chart is presented giving an overview of the presently described iSCSI Boot/Page Media (BPM) failover approach. After initiator block 502, a determination is made that the OS is operating normally (block 504), as indicated by the Normal/Protected Mode notation. Standard Windows®/Linux® transport is occurring for packets going between the computer and the remote hard drive array. A timer expires or an anomaly event occurs (block 506), and at least initially the iSCSI activities (block 508) continue to function normally. However, if a determination is made that the OS network services are not up (query block 510), resulting in the loss of the normal use of the NIC as described above, then the process enters the System Management Mode, interrupts are disabled, Information Storage and Retrieval (ISR) registers are checked for events such as any inbound packets to process, transmission packet rings are set up to handle the launching of outbound packets, and iSCSI process (Basic) is initiated using the UNDI transport described in FIG. 2 c (block 516). If the OS services are up, then the iSCSI processing (Rich) network OS transport services are used in a normal fashion (block 512). In either event, iSCSI cleanup processing (block 514) occurs, including completion of transmitting and receiving any packets in flight between the computer and the hard drive array, and the process ends (terminator block 518).

With regard to the SMM (System Management Mode) used in the present invention, consider the following overview. SMM is a special execution mode that preferably is able to handle a big address mode capable of accessing up to 4 GB of memory space (with a default of 1 MB using 16 bit op/16 bit segment). Single threaded execution in SMM executes until a return call returns the processor to its previous state. All protected mode data structures are left intact during interrupts by saving processor state to state space upon a system management call. Any processor in a blade or a blade server can enter system SMM upon an SMI interrupt, so these entries must be coordinated among blades in a multi-blade server chassis. SMM is valuable in the present invention since it is an independent and isolated environment from the OS, and thus has no dependencies on OS services. Furthermore, the real mode address found in SMM is conducive to UNDI usage due to simple addressing features.
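The 1 MB default noted above is consistent with conventional x86 real mode segment:offset addressing, in which a 16-bit segment is shifted left by four bits and added to a 16-bit offset. The following sketch shows that arithmetic; it reflects the general x86 convention rather than any detail stated in this specification, and the function name `real_mode_address` is hypothetical.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Compute a real mode physical address from a 16-bit segment and offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Segment is scaled by 16 (shifted left 4 bits), then the offset is added.
    return (segment << 4) + offset
```

For instance, segment 0xA000 with offset 0x0010 maps to physical address 0xA0010, which lies within the first megabyte (0x100000 bytes).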

With reference now to FIG. 6, an overview of the Login process of the iSCSI BPM failover process is shown, in which the transition from OS network services to iSCSI UNDI services is initiated. After initiator block 602, steps 604 through 608 are similar to steps 504 through 508 described in FIG. 5. If OS network services are up (query block 610), then a restart of operations using a Transport Driver Interface (TDI), which is a software interface between the protocols and the Application Programming Interface (API) layers of the Windows® NT network model, is performed (block 612), and the process ends (terminator block 614). If the OS network services are not up, such as when the iSCSI driver detects that TDI is down, the kick off failover (block 616) initiates, including preparing to re-login using the UNDI transport, starting a timer to ensure call returns are proper, etc. Thus, when a new login request for data being transported between the computer and the hard drive array occurs, an SMI call to the iSCSI processing using UNDI transport occurs (block 620) in the SMM. The UNDI is polled for a response, including jumping into SMM, querying status/receive/transmit requests for iSCSI, etc. A session negotiation is launched via the SMM UNDI, which returns from the SMM for OS execution. When UNDI determines that the OS anomaly is over, then an RMS return call returns the operation to the normal (now protected) OS mode, and iSCSI cleanup is performed (block 622), ending the process (terminator block 614).

FIG. 7 shows the process when iSCSI BPM failover is in steady state. Steps 702 through 714 are similar to those described above as steps 602 through 614, and will not be re-described. Note that in normal mode, the iSCSI driver prepares outbound requests and processes inbound requests for data to be stored or retrieved from the hard drive array. When UNDI is called (kick off failover, block 716), outbound requests are prepared with an SMI call to SMM (block 720), which processes inbound responses with an RMS return (block 722).

FIG. 8 shows the process when iSCSI BPM failover is logging out, thus taking steps to ensure a proper transition back to normal OS operations. Steps 802 through 810 are as described above for steps 602 through 610. If OS network services are not up, then failover is kicked off in SMM (block 812). Once the OS network services are back up, then the rich (full function) OS is restarted using TDI (block 816), the iSCSI session is logged out in normal mode (block 818), and normal software iSCSI target (120 in FIG. 2 c) is logged into by the OS (block 820), thus ending the process (terminator block 814).

Note that from the OS network services' perspective, the present invention addresses the operation on a given blade. This results in resiliency in the absence of OS network services due to atypical behavior on the blade itself, and allows the blade to have redundant paths to handle OS anomalies. In normal operations, the blade uses the standard OS network services (TDI/NDIS network transport services in Windows®, newt0 network transport services in Linux®, and rich scope for all Logical Unit Numbers (LUNs) that uniquely identify SCSI busses to distinguish between devices that share the same SCSI bus). During anomaly conditions such as OS shutdowns, OS upgrades and OS crashes, the computer is able to use the described BIOS/UNDI network services. For both Windows® and Linux® systems, BIOS/UNDI services are re-enabled. Because the Boot LUN is used, there is simple threading and simple memory management.

It should be understood that at least some aspects of the present invention may alternatively be implemented in a program product. Programs defining functions of the present invention can be delivered to a data storage system or a computer system via a variety of signal-bearing media, which include, without limitation, non-writable storage media (e.g., CD-ROM), writable storage media (e.g., a floppy diskette, hard disk drive, read/write CD ROM, optical media), and communication media, such as computer and telephone networks including Ethernet. It should be understood, therefore, that such signal-bearing media, when carrying or encoding computer readable instructions that direct method functions in the present invention, represent alternative embodiments of the present invention. Further, it is understood that the present invention may be implemented by a system having means in the form of hardware, software, or a combination of software and hardware as described herein or their equivalent.

While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

1. A computer system comprising: a processor; a bus coupled to said processor; a memory coupled to said bus; and a software iSCSI HBA emulator for emulating an Internet Small Computer System Interface (iSCSI) Host Bus Adapter (HBA) in software to enable a communication of data packets between a computer and a secondary memory via a Network Interface Card (NIC) driver, wherein the NIC driver is disabled in response to an anomaly in an Operating System (OS) in the computer, and wherein, in response to the NIC driver being disabled due to the anomaly in the OS in the computer, the emulated iSCSI HBA activates a Universal Network Device Interface (UNDI) driver to re-enable the NIC driver for communication via a network between the computer and the secondary memory.

2. The system of claim 1, wherein the UNDI driver is activated by calling a Basic Input/Output System (BIOS) iSCSI boot loader to activate the UNDI driver.

3. The system of claim 2, wherein the BIOS iSCSI boot loader is called by a System Management Interrupt (SMI) signal from the emulated iSCSI HBA that is stored within the computer.

4. The system of claim 1, wherein the secondary memory is a hard drive array.

5. The system of claim 4, wherein the network is an Internet.

6. The system of claim 1, wherein the anomaly in the OS is an initiation of a shut-down of the OS.